Evaluation of scoring functions for protein multiple sequence alignment using structural alignments
Abstract
Aligning a group of protein sequences to obtain a meaningful Multiple Sequence Alignment (MSA) is a basic task in current bioinformatics research. The development of new MSA algorithms raises the need for an efficient way to evaluate the quality of an alignment, in order to select the best alignment among those produced by the available algorithms. A natural way to evaluate alignments is to use scoring functions, which assign to each alignment a number reflecting its quality. Different scoring functions for MSA have been proposed over the years, which in turn raises the need for methodical ways to assess the quality of such functions. A few methods for assessing the quality of scoring functions for pairwise alignments have been proposed. These methods compare alignments that are optimal for a given scoring function to structural alignments (alignments obtained through analysis of the three-dimensional structures of related proteins). A main obstacle to using these methods for evaluating scoring functions for alignments of k > 2 sequences is the unavailability of efficient algorithms for computing optimal alignments (for a given scoring function) of even a moderate number of sequences. We propose a framework for bypassing this difficulty, which is based on computing the correlation between suboptimal alignments. An inherent issue that needs to be addressed in our method is the identification of an appropriate sample set of alignments to be used in the correlation test. We describe this problem, suggest a solution, and report results obtained with this solution. Our results indicate that, for most scoring functions, the addition of appropriate gap penalties improves the quality of the function. One notable exception is COFFEE, for which the average improvement after adding gap penalties was negligible in all of our experiments. COFFEE was also the best function in average quality over the entire benchmark tested.
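The abstract does not spell out how the correlation test is computed, so the following is only a minimal sketch of one plausible reading: for a sample of candidate alignments, compare each alignment's score with its agreement with a structural reference alignment, then correlate the two lists. The function names (`aligned_pairs`, `agreement`, `pearson`), the pair-based agreement measure, and the choice of Pearson correlation are all assumptions made for illustration and are not taken from the paper.

```python
from itertools import combinations
from math import sqrt


def aligned_pairs(msa):
    """Return the set of residue pairs that share a column.

    Each residue is identified as (sequence index, residue index),
    so the result does not depend on where gaps ('-') are placed.
    """
    counters = [0] * len(msa)  # next residue index for each sequence
    pairs = set()
    for col in range(len(msa[0])):
        residues = []
        for s, row in enumerate(msa):
            if row[col] != "-":
                residues.append((s, counters[s]))
                counters[s] += 1
        pairs.update(combinations(residues, 2))
    return pairs


def agreement(msa, reference):
    """Fraction of the reference's aligned residue pairs found in msa."""
    ref = aligned_pairs(reference)
    return len(aligned_pairs(msa) & ref) / len(ref) if ref else 0.0


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists.

    Assumes neither list is constant (otherwise the denominator is 0).
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)
```

Under this reading, a scoring function would be judged good when `pearson(scores, agreements)` is high over the sample, where `scores` holds the function's value for each sampled alignment and `agreements` holds the corresponding `agreement(msa, reference)` values. How the sample of alignments is chosen is the issue the abstract identifies as central, and this sketch does not address it.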
Notations and Abbreviations
MSA – Multiple Sequence Alignment
NW – Needleman-Wunsch algorithm/scoring function
SoP – Sum of Pairs scoring function
CoG – Center of Gravity scoring function
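As an illustration of the SoP entry above, here is a minimal sketch of a Sum of Pairs score with a simple per-column gap penalty. The `match`, `mismatch`, and `gap` values are placeholder assumptions; a realistic setup would typically use a substitution matrix such as BLOSUM62, and the gap penalties studied in the paper may be defined differently (for example, with separate gap-open and gap-extension costs).

```python
from itertools import combinations


def sum_of_pairs(msa, match=1, mismatch=-1, gap=-2):
    """Sum of Pairs score of an alignment given as equal-length strings.

    Every pair of symbols in every column contributes to the total:
    gap/gap pairs score 0, residue/gap pairs score `gap`, and
    residue/residue pairs score `match` or `mismatch`.
    """
    score = 0
    for column in zip(*msa):
        for a, b in combinations(column, 2):
            if a == "-" and b == "-":
                continue  # two gaps: no contribution
            if a == "-" or b == "-":
                score += gap
            elif a == b:
                score += match
            else:
                score += mismatch
    return score
```

For example, `sum_of_pairs(["AC-G", "ACTG"])` adds three matches and one residue/gap pair. Setting `gap=0` gives a version without gap penalties, which is the kind of contrast the abstract reports on.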
Publication date: 2006